Cornell College
STA 200 Fall 2025 Block 1
The table below shows the distribution of patients with good outcomes at 6-month follow-up. Note that 7 patients dropped out of the study: 3 from the treatment and 4 from the control group.
| Group | Yes | No | Total |
|---|---|---|---|
| Treatment | 19 | 8 | 27 |
| Control | 5 | 21 | 26 |
| Total | 24 | 29 | 53 |
Proportion with good outcomes in treatment group:
\[ \tfrac{19}{27} \approx 0.70 \;\rightarrow\; 70\% \]
Proportion with good outcomes in control group:
\[ \tfrac{5}{26} \approx 0.19 \;\rightarrow\; 19\% \]
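The two proportions can be reproduced directly from the table. This is a minimal sketch; the variable and function names are illustrative and not part of the course materials.

```python
# Counts from the 6-month follow-up table: (good outcome, no good outcome)
counts = {"treatment": (19, 8), "control": (5, 21)}


def proportion_good(yes, no):
    """Share of patients in a group with a good outcome."""
    return yes / (yes + no)


p_treatment = proportion_good(*counts["treatment"])
p_control = proportion_good(*counts["control"])
difference = p_treatment - p_control

print(f"treatment:  {p_treatment:.2f}")
print(f"control:    {p_control:.2f}")
print(f"difference: {difference:.2f}")
```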
Do the data show a “real” difference between the groups?
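One way to probe this question is a randomization simulation: if treatment made no difference, the 24 good outcomes would simply be scattered at random among the 53 patients. The sketch below is an illustration under that assumption, not the method used in the study; the function name, number of simulations, and seed are arbitrary choices.

```python
import random


def simulate_differences(n_sims=10_000, seed=1):
    """Shuffle outcomes between groups, assuming treatment has no effect."""
    rng = random.Random(seed)
    outcomes = [1] * 24 + [0] * 29  # 24 good outcomes, 29 without
    diffs = []
    for _ in range(n_sims):
        rng.shuffle(outcomes)
        treatment = outcomes[:27]  # 27 patients in the treatment group
        control = outcomes[27:]    # 26 patients in the control group
        diffs.append(sum(treatment) / 27 - sum(control) / 26)
    return diffs


observed = 19 / 27 - 5 / 26
diffs = simulate_differences()

# Share of shuffles with a difference at least as large as the observed one
p_value = sum(d >= observed - 1e-12 for d in diffs) / len(diffs)
print(f"observed difference: {observed:.2f}")
print(f"simulated share at least as extreme: {p_value:.4f}")
```

If very few shuffles produce a difference as large as the observed one, chance alone is an unconvincing explanation for the gap between the groups.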
Are the results of this study generalizable to all patients with chronic fatigue syndrome?
These patients had specific characteristics and volunteered to be part of this study, so they may not be representative of all patients with chronic fatigue syndrome. While we cannot immediately generalize the results to all patients, this first study is encouraging. The method works for patients with some narrow set of characteristics, and that gives hope that it will work, at least to some degree, with other patients.
A survey was conducted on students in an introductory statistics course. Below are a few of the questions on the survey, and the corresponding variables the data from the responses were stored in:
gender: What is your gender?
intro_extra: Do you consider yourself introverted or extraverted?
sleep: How many hours do you sleep at night, on average?
bedtime: What time do you usually go to bed?
countries: How many countries have you visited?
dread: On a scale of 1-5, how much do you dread being here?

Data collected on students in a statistics class on a variety of variables:
| Stu. | gender | intro_extra | … | dread |
|---|---|---|---|---|
| 1 | male | extravert | … | 3 |
| 2 | female | extravert | … | 2 |
| 3 | female | introvert | … | 4 |
| 4 | female | extravert | … | 2 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 86 | male | extravert | … | 3 |
In a data matrix, each column is a variable and each row is an observation.
| | gender | sleep | bedtime | countries | dread |
|---|---|---|---|---|---|
| 1 | male | 5 | 12–2 | 13 | 3 |
| 2 | female | 7 | 10–12 | 7 | 2 |
| 3 | female | 5.5 | 12–2 | 1 | 4 |
| 4 | female | 7 | 12–2 | 2 | |
| 5 | female | 3 | 12–2 | 1 | 3 |
| 6 | female | 3 | 12–2 | 9 | 4 |
gender: categorical
sleep: numerical, continuous
bedtime: categorical, ordinal
countries: numerical, discrete
dread: categorical, ordinal; it could also be treated as numerical
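The variable type determines which summaries make sense. The sketch below uses the gender, sleep, and bedtime values of the six students shown above. The ordering of bedtime levels is an assumption (only two levels appear in the excerpt), and all names in the code are illustrative.

```python
from collections import Counter
from statistics import mean

# Values copied from the first six rows of the data matrix
students = [
    {"gender": "male", "sleep": 5, "bedtime": "12–2"},
    {"gender": "female", "sleep": 7, "bedtime": "10–12"},
    {"gender": "female", "sleep": 5.5, "bedtime": "12–2"},
    {"gender": "female", "sleep": 7, "bedtime": "12–2"},
    {"gender": "female", "sleep": 3, "bedtime": "12–2"},
    {"gender": "female", "sleep": 3, "bedtime": "12–2"},
]

# Assumed ordering of bedtime levels, earliest to latest
BEDTIME_LEVELS = ["10–12", "12–2"]

# Numerical, continuous: averaging is meaningful
mean_sleep = mean(s["sleep"] for s in students)

# Categorical: count how often each level occurs
gender_counts = Counter(s["gender"] for s in students)

# Ordinal: levels can be compared by rank, but arithmetic on them is not meaningful
latest_bedtime = max((s["bedtime"] for s in students), key=BEDTIME_LEVELS.index)

print(f"mean sleep: {mean_sleep:.2f} hours")
print(f"gender counts: {dict(gender_counts)}")
print(f"latest bedtime level: {latest_bedtime}")
```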
Practice question
What type of variable is a telephone area code?
Answer: categorical. An area code is a label, so arithmetic on it is not meaningful.
Question
Does there appear to be a relationship between GPA and number of hours students study per week?
Question
Can you spot anything unusual about any of the data points?
Solution
There is one student with GPA > 4.0 — this is likely a data error.
To identify the explanatory variable in a pair of variables, identify which of the two is suspected of affecting the other:
explanatory variable → response variable
Labeling variables as explanatory and response does not guarantee the relationship between the two is actually causal, even if there is an association identified between the two variables. We use these labels only to keep track of which variable we suspect affects the other.
Practice question
Based on the scatterplot on the right, which of the following statements is correct about the head and skull lengths of possums?
Research question: Can people become better, more efficient runners on their own, merely by running?
Population of interest:
Answer
All people
Sample: Group of adult women who recently joined a running group
Population to which results can be generalized:
Answer
Adult women, if the data are randomly sampled
Non-response: If only a small fraction of the randomly sampled people respond, the sample may not be representative.
Voluntary response: When people with strong opinions self-select into the sample, it’s not representative.
cnn.com, Jan 14, 2012
In 1936, Landon sought the Republican presidential nomination opposing FDR’s re-election.
Note
A school district is considering banning high school student parking after two accidents. Parents are surveyed by mail. Of 6,000 surveys, 1,200 returned: 960 agree, 240 disagree. Which statements are true?
I. Some mailings may never have reached parents.
II. The district has strong support from parents to move forward.
III. It’s possible a majority of parents disagree.
IV. Results are unlikely to be biased because all parents were mailed.
Answer choices:
a. Only I
b. I and II
c. I and III
d. III and IV
e. Only IV
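Answer: c. I and III. With only 1,200 of 6,000 surveys returned, non-response bias is possible, so the respondents' views need not reflect those of all parents.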
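The arithmetic behind the parking survey can be laid out explicitly. This minimal sketch uses only the counts given in the question; the variable names are illustrative. It shows how little the returned surveys pin down once the non-respondents are taken into account.

```python
mailed = 6000
returned = 1200
agree = 960
disagree = 240

response_rate = returned / mailed          # share of parents who responded
agree_among_respondents = agree / returned

# Nothing is known about the parents who did not respond
nonrespondents = mailed - returned

# Extreme case: every non-respondent disagrees with the ban
min_agree_share = agree / mailed
max_disagree_share = (disagree + nonrespondents) / mailed

print(f"response rate: {response_rate:.0%}")
print(f"agree among respondents: {agree_among_respondents:.0%}")
print(f"agree among all parents, at minimum: {min_agree_share:.0%}")
print(f"disagree among all parents, at most: {max_disagree_share:.0%}")
```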
Simple random sample: Randomly select cases from the population, with no implied connection between selected points.
Stratified sample: Strata = groups of similar observations. Take a simple random sample from each stratum.
Cluster sample: Clusters are usually heterogeneous. Randomly sample clusters, then include all observations in them. Often chosen for cost reasons.
Multistage sample: Clusters sampled first, then take a simple random sample within those clusters.
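The four sampling schemes can be compared on a small made-up population. In this sketch the population (five neighborhoods of 20 households each), the function names, and the sample sizes are all hypothetical, chosen only to show how the selection steps differ.

```python
import random

rng = random.Random(0)

# Hypothetical population: 5 neighborhoods (A to E) with 20 households each
population = [(hood, house) for hood in "ABCDE" for house in range(20)]


def group_by_neighborhood(units):
    groups = {}
    for hood, house in units:
        groups.setdefault(hood, []).append((hood, house))
    return groups


def simple_random_sample(units, k):
    """Every household has the same chance of selection."""
    return rng.sample(units, k)


def stratified_sample(units, per_stratum):
    """Simple random sample from within every stratum."""
    sample = []
    for members in group_by_neighborhood(units).values():
        sample.extend(rng.sample(members, per_stratum))
    return sample


def cluster_sample(units, n_clusters):
    """Randomly pick clusters, then keep every household in them."""
    groups = group_by_neighborhood(units)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [unit for hood in chosen for unit in groups[hood]]


def multistage_sample(units, n_clusters, per_cluster):
    """Randomly pick clusters, then take a simple random sample within each."""
    groups = group_by_neighborhood(units)
    chosen = rng.sample(sorted(groups), n_clusters)
    return [u for hood in chosen for u in rng.sample(groups[hood], per_cluster)]


srs = simple_random_sample(population, 10)
strat = stratified_sample(population, 2)
clust = cluster_sample(population, 2)
multi = multistage_sample(population, 2, 5)

for name, sample in [("simple", srs), ("stratified", strat),
                     ("cluster", clust), ("multistage", multi)]:
    hoods = sorted({hood for hood, _ in sample})
    print(f"{name:<11} n={len(sample):<3} neighborhoods={hoods}")
```

The stratified sample always covers every neighborhood, while the cluster and multistage samples only reach the neighborhoods that were drawn in the first step.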
Note
A city council requests a household survey in a suburban area with varied neighborhoods. Which approach would be least effective?
Answer
We would like to design an experiment to investigate if energy gels make you run faster:
It is suspected that gels might affect pros and amateurs differently, so we block for pro status:
Why is this important? Can you think of other variables to block for?
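Blocked random assignment for the energy gel experiment could look like the sketch below. The roster of runners, the function name, and the "gel" / "no gel" labels are hypothetical; the point is only that randomization happens separately within the pro block and the amateur block.

```python
import random


def blocked_assignment(runners, seed=0):
    """Randomly split each block (pro, amateur) evenly between the two groups."""
    rng = random.Random(seed)
    assignment = {}
    for status in ("pro", "amateur"):
        block = [name for name, s in runners if s == status]
        rng.shuffle(block)
        half = len(block) // 2
        for name in block[:half]:
            assignment[name] = "gel"
        for name in block[half:]:
            assignment[name] = "no gel"
    return assignment


# Hypothetical roster: 4 pros and 8 amateurs
runners = ([(f"pro_{i}", "pro") for i in range(4)]
           + [(f"amateur_{i}", "amateur") for i in range(8)])

assignment = blocked_assignment(runners)
pros_with_gel = sum(1 for name, group in assignment.items()
                    if name.startswith("pro") and group == "gel")
amateurs_with_gel = sum(1 for name, group in assignment.items()
                        if name.startswith("amateur") and group == "gel")

print(f"pros with gel: {pros_with_gel} of 4")
print(f"amateurs with gel: {amateurs_with_gel} of 8")
```

Because each block is split evenly, pro status is balanced across the gel and no-gel groups and cannot explain a difference in running times between them.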
Note
A study tests the effect of light level and noise level on exam performance. The researcher suspects gender might moderate effects and wants equal gender representation in each group. Which is correct?
Answer
Note
What is the main difference between observational studies and experiments?
Answer